Given a particular embodiment, we propose a novel method (C3PO) that learns policies able to achieve any arbitrary position and pose. Such a policy would allow for easier control, and would be re-useable as a key building block for downstream tasks. The method is two-fold: First, we introduce a novel exploration algorithm that optimizes for uniform coverage, is able to discover a set of achievable states, and investigates its abilities in attaining both high coverage, and hard-to-discover states; Second, we leverage this set of achievable states as training data for a universal goal-achievement policy, a goal-based SAC variant. We demonstrate the trained policy's performance in achieving a large number of novel states. Finally, we showcase the influence of massive unsupervised training of a goal-achievement policy with state-of-the-art pose-based control of the Hopper, Walker, Halfcheetah, Humanoid and Ant embodiments.
translated by 谷歌翻译
深度强化学习(RL)导致了许多最近和开创性的进步。但是,这些进步通常以培训的基础体系结构的规模增加以及用于训练它们的RL算法的复杂性提高,而均以增加规模的成本。这些增长反过来又使研究人员更难迅速原型新想法或复制已发表的RL算法。为了解决这些问题,这项工作描述了ACME,这是一个用于构建新型RL算法的框架,这些框架是专门设计的,用于启用使用简单的模块化组件构建的代理,这些组件可以在各种执行范围内使用。尽管ACME的主要目标是为算法开发提供一个框架,但第二个目标是提供重要或最先进算法的简单参考实现。这些实现既是对我们的设计决策的验证,也是对RL研究中可重复性的重要贡献。在这项工作中,我们描述了ACME内部做出的主要设计决策,并提供了有关如何使用其组件来实施各种算法的进一步详细信息。我们的实验为许多常见和最先进的算法提供了基准,并显示了如何为更大且更复杂的环境扩展这些算法。这突出了ACME的主要优点之一,即它可用于实现大型,分布式的RL算法,这些算法可以以较大的尺度运行,同时仍保持该实现的固有可读性。这项工作提出了第二篇文章的版本,恰好与模块化的增加相吻合,对离线,模仿和从演示算法学习以及作为ACME的一部分实现的各种新代理。
translated by 谷歌翻译
Machine learning model development and optimisation can be a rather cumbersome and resource-intensive process. Custom models are often more difficult to build and deploy, and they require infrastructure and expertise which are often costly to acquire and maintain. Machine learning product development lifecycle must take into account the need to navigate the difficulties of developing and deploying machine learning models. evoML is an AI-powered tool that provides automated functionalities in machine learning model development, optimisation, and model code optimisation. Core functionalities of evoML include data cleaning, exploratory analysis, feature analysis and generation, model optimisation, model evaluation, model code optimisation, and model deployment. Additionally, a key feature of evoML is that it embeds code and model optimisation into the model development process, and includes multi-objective optimisation capabilities.
translated by 谷歌翻译
The Graph Protocol indexes historical blockchain transaction data and makes it available for querying. As the protocol is decentralized, there are many independent Indexers that index and compete with each other for serving queries to the Consumers. One dimension along which Indexers compete is pricing. In this paper, we propose a bandit-based algorithm for maximization of Indexers' revenue via Consumer budget discovery. We present the design and the considerations we had to make for a dynamic pricing algorithm being used by multiple agents simultaneously. We discuss the results achieved by our dynamic pricing bandits both in simulation and deployed into production on one of the Indexers operating on Ethereum. We have open-sourced both the simulation framework and tools we created, which other Indexers have since started to adapt into their own workflows.
translated by 谷歌翻译
This thesis introduces quantum natural language processing (QNLP) models based on a simple yet powerful analogy between computational linguistics and quantum mechanics: grammar as entanglement. The grammatical structure of text and sentences connects the meaning of words in the same way that entanglement structure connects the states of quantum systems. Category theory allows to make this language-to-qubit analogy formal: it is a monoidal functor from grammar to vector spaces. We turn this abstract analogy into a concrete algorithm that translates the grammatical structure onto the architecture of parameterised quantum circuits. We then use a hybrid classical-quantum algorithm to train the model so that evaluating the circuits computes the meaning of sentences in data-driven tasks. The implementation of QNLP models motivated the development of DisCoPy (Distributional Compositional Python), the toolkit for applied category theory of which the first chapter gives a comprehensive overview. String diagrams are the core data structure of DisCoPy, they allow to reason about computation at a high level of abstraction. We show how they can encode both grammatical structures and quantum circuits, but also logical formulae, neural networks or arbitrary Python code. Monoidal functors allow to translate these abstract diagrams into concrete computation, interfacing with optimised task-specific libraries. The second chapter uses DisCopy to implement QNLP models as parameterised functors from grammar to quantum circuits. It gives a first proof-of-concept for the more general concept of functorial learning: generalising machine learning from functions to functors by learning from diagram-like data. In order to learn optimal functor parameters via gradient descent, we introduce the notion of diagrammatic differentiation: a graphical calculus for computing the gradients of parameterised diagrams.
translated by 谷歌翻译
As aerial robots are tasked to navigate environments of increased complexity, embedding collision tolerance in their design becomes important. In this survey we review the current state-of-the-art within the niche field of collision-tolerant micro aerial vehicles and present different design approaches identified in the literature, as well as methods that have focused on autonomy functionalities that exploit collision resilience. Subsequently, we discuss the relevance to biological systems and provide our view on key directions of future fruitful research.
translated by 谷歌翻译
Compared with model-based control and optimization methods, reinforcement learning (RL) provides a data-driven, learning-based framework to formulate and solve sequential decision-making problems. The RL framework has become promising due to largely improved data availability and computing power in the aviation industry. Many aviation-based applications can be formulated or treated as sequential decision-making problems. Some of them are offline planning problems, while others need to be solved online and are safety-critical. In this survey paper, we first describe standard RL formulations and solutions. Then we survey the landscape of existing RL-based applications in aviation. Finally, we summarize the paper, identify the technical gaps, and suggest future directions of RL research in aviation.
translated by 谷歌翻译
Recent mean field interpretations of learning dynamics in over-parameterized neural networks offer theoretical insights on the empirical success of first order optimization algorithms in finding global minima of the nonconvex risk landscape. In this paper, we explore applying mean field learning dynamics as a computational algorithm, rather than as an analytical tool. Specifically, we design a Sinkhorn regularized proximal algorithm to approximate the distributional flow from the learning dynamics in the mean field regime over weighted point clouds. In this setting, a contractive fixed point recursion computes the time-varying weights, numerically realizing the interacting Wasserstein gradient flow of the parameter distribution supported over the neuronal ensemble. An appealing aspect of the proposed algorithm is that the measure-valued recursions allow meshless computation. We demonstrate the proposed computational framework of interacting weighted particle evolution on binary and multi-class classification. Our algorithm performs gradient descent of the free energy associated with the risk functional.
translated by 谷歌翻译
语义搜索是一项重要的任务,目的是从数据库中找到相关索引以进行查询。它需要一个可以正确学习句子语义的检索模型。基于变压器的模型由于其出色的学习语义表示能力而被广泛用作检索模型。同时,还提出了许多适合它们的正则化方法。在本文中,我们提出了一种新的正则化方法:正则化对比度学习,可以帮助基于变压器的模型学习更好地表示句子。首先,它为每个句子增强了几个不同的语义表示,然后将它们作为监管机构的对比目标。这些对比调节器可以克服过度拟合的问题并减轻各向异性问题。我们首先使用优于预训练的模型Sroberta对7个语义搜索基准测试进行评估。结果表明,我们的方法更有效地学习了出色的句子表示。然后,我们评估具有长期查询和索引的2个具有挑战性的FAQ数据集,咳嗽和FAQIR。我们的实验结果表明,我们的方法表现优于基线方法。
translated by 谷歌翻译
Covid-19幸存者中很大一部分经历了经常影响日常生活的持续多系统症状,这种疾病被称为SARS-COV-2感染的长期或急性后静脉曲张。但是,识别长期的卷文章是具有挑战性的,因为文章是指使用各种较少常见的术语或根本不使用命名的条件。我们开发了一个迭代的人类机器学习框架,旨在有效利用可用的数据并最有效地利用人类标签。具体而言,我们的方法将数据编程与主动学习结合到了强大的集合模型中。在保留集上评估我们的模型表明了其他方法的灵敏度的三倍。我们将模型应用于PubMed来创建长期的共同集合,并证明(1)最长的卷vid文章在命名该条件时并不是用任何名称(2)来指代长的covid,在生物医学文献中最常使用的名称是长的,并且(3)长互联物与各种身体系统中的疾病有关。长期COVID系列每周更新,可在Litcovid门户网站上进行在线搜索:https://www.ncbi.nlm.nih.gov/research/coronavirus/docsum/docsum?filters=e_condition.longcondition.longcovid.longcovid
translated by 谷歌翻译